You can follow along by importing the Livebook from here: https://gist.github.com/seanmor5/dc077ea5dcc44f6e9d4fbfb34d834552
Preface
NOTE: Before reading, I highly recommend checking out my first post which serves as an introduction to Nx and the Nx ecosystem.
This XKCD was published in September of 2014, roughly two years following deep learning’s watershed moment – AlexNet. At the time, deep learning was still in its nascent stages, and classifying images of “bird or no bird” might have seemed like an impossible task. Today, thanks to neural networks, we can “solve” this task in roughly 30 minutes – which is precisely what we’ll do with Elixir and Axon.
NOTE: To say the problem of computer vision is “solved” is debatable. While we’re able to achieve incredible performance on image classification, image segmentation, object detection, etc., there are still many open problems in the field. Models still fail in hilarious ways. In this context, “solved” really means suitable accuracy for the purposes of this demonstration.
Introduction
Axon is a library for creating neural networks for the Elixir programming language. The library is built entirely on top of Nx, which means it can be combined with compilers such as EXLA to accelerate programs with “just-in-time” (JIT) compilation to the CPU, GPU, or TPU. What does that actually mean? Somebody has taken care of the hard work for us! In order to take advantage of our hardware, we need optimized and specialized kernels. Fortunately, Nx and EXLA will take care of generating these kernels for us (by delegating them to another compiler). We can focus on our high-level implementation, and not the low-level details.
You don’t need to understand the intricacies of GPU programming or optimized mathematical routines to train real and practical neural networks.
What is a neural network?
A neural network is really just a function which maps inputs to outputs:
- Pictures of cats and dogs -> label cat or dog
- Lot size, square footage, # of bedrooms, # of bathrooms -> housing price
- Movie review -> positive or negative rating
The “magic” is what happens during the transformation of input data to output label. Imagine a cohesive team of engineers solving problems. Each engineer brings their own unique perspective to a problem, applies their expertise, and their efforts are coordinated with the group in a meaningful way to deliver an excellent product. This coordinated effort is analogous to the coordinated effort of layers in a neural network. Each layer learns its own representation of the input data, which is then given to the next layer, and the next layer, and so on until we’re left with a meaningful representation:
In the diagram above, you’ll notice that information flows forward. Occasionally, you’ll hear the term feed-forward networks which is derived from the fact that information flows forward in a neural network.
Successive transformations in a neural network are typically referred to as layers. Mathematically, a layer is just a function:
A two-layer network, for example, is just the composition f(x) = f_1(f_2(x)), where f_1 and f_2 are layers. For those who like to read code more than equations, the transformations essentially boil down to the following Elixir code:
def f(x, parameters) do
x
|> f_2(parameters)
|> f_1(parameters)
end
def f_1(x, parameters) do
{w1, b1, _, _} = parameters
x * w1 + b1
end
def f_2(x, parameters) do
{_, _, w2, b2} = parameters
x * w2 + b2
end
In the diagram, you’ll also notice the term activation function. Activation functions are nonlinear element-wise functions which scale layer outputs. You can think of them as “activating” or highlighting important information as it propagates through the network. With activation functions, our simple two-layer neural network starts to look something like:
def f(x, parameters) do
x
|> f_2(parameters)
|> activation_2()
|> f_1(parameters)
|> activation_1()
end
def f_1(x, parameters) do
{w1, b1, _, _} = parameters
x * w1 + b1
end
def activation_1(x) do
sigmoid(x)
end
def f_2(x, parameters) do
{_, _, w2, b2} = parameters
x * w2 + b2
end
def activation_2(x) do
sigmoid(x)
end
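To make the squashing behavior concrete, here is the sigmoid function as a scalar sketch in plain Elixir (no Nx required). The threshold reading at the end is the same interpretation we’ll use later for binary classification:

```elixir
# Sigmoid squashes any real number into the open interval (0, 1)
sigmoid = fn x -> 1.0 / (1.0 + :math.exp(-x)) end

sigmoid.(-10.0)  # very close to 0
sigmoid.(0.0)    # exactly 0.5
sigmoid.(10.0)   # very close to 1
```

Because the output always lands between 0 and 1, a sigmoid output can be read as a probability, which is exactly how the final layer of the model later in this post is interpreted.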
The “learning” in “deep learning” comes from learning the parameters defined in the above functions such that they effectively solve a given task. As noted at the beginning of this post, “solve” is a relative term. Neural networks trained on separate tasks will have entirely different success criteria.
The learning process is commonly referred to as training. Neural networks are typically trained using gradient descent. Gradient descent optimizes the parameters of a neural network to minimize a loss function. A loss function is essentially the success criteria you define for your problem.
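To make that concrete, here is a toy version of gradient descent in plain Elixir: we fit a single parameter w in the model y = w * x by hand-deriving the gradient of a mean-squared-error loss. This is only a sketch of the idea; Axon derives gradients automatically via Nx, and real models have far more parameters:

```elixir
# Synthetic data generated from a "true" parameter of 3.0
data = for x <- 1..10, do: {x * 1.0, 3.0 * x}

# One gradient descent step: move w against the gradient of the MSE loss,
# d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
step = fn w ->
  grad =
    Enum.sum(Enum.map(data, fn {x, y} -> 2.0 * (w * x - y) * x end)) / length(data)

  w - 0.01 * grad  # 0.01 is the learning rate
end

# Starting from w = 0.0, repeated steps recover w ≈ 3.0
Enum.reduce(1..100, 0.0, fn _, w -> step.(w) end)
```

The loss function defines what “success” means (small squared error), and the optimizer (here, plain gradient descent with a fixed learning rate) defines how we move toward it.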
Given a task, we can essentially boil down the process of creating and training neural networks to:
- Gather, explore, and normalize the data
- Define the model
- Define the success criteria (loss function)
- Define the training process (optimizer)
- Instrument with metrics, logging, etc.
- Go!
Axon makes steps 2-6 quick and easy – so much so that most of your time should be spent on step 1 with the data. For the rest of this post, we’ll walk through an example workflow in Axon, and see how easy it is to create and train a neural network from scratch.
Requirements
To start, we’ll need to install some prerequisites. For this example, we’ll use Axon, Nx, and EXLA to take care of our data processing and neural network training. We’ll use Flow to create a simple IO input pipeline. Pixels will help us decode our raw JPEGs and PNGs to tensors. Finally, Kino will allow us to render some of our data for analysis.
Additionally, we’ll set the default Defn compiler to EXLA. This will ensure all of our functions in Axon run using the XLA compiler. This really just means they’ll run much faster than they would in pure Elixir.
Mix.install([
{:axon, "~> 0.1.0-dev", github: "elixir-nx/axon"},
{:exla, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "exla"},
{:nx, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "nx", override: true},
{:flow, "~> 1.1.0"},
{:pixels, "~> 0.2.0"},
{:kino, "~> 0.3.1"}
])
Nx.Defn.default_options(compiler: EXLA)
The Data
Our goal is to differentiate between images of birds and images that are not birds. In a practical setting we’d probably want our negative examples to include images of nature and the other settings birds are found in, but in this example we’ll use pictures of cats. Cats are definitely not birds.
For images of birds, we’ll use Caltech-UCSD Birds 2011 which is an open-source dataset consisting of around 11k images of various birds. For images of cats, we’ll use Cats vs. Dogs which is a dataset consisting of around 25k images of cats and dogs. The rest of this post will assume this data is downloaded locally.
Let’s start by getting an idea of what we’re working with:
cats = "PetImages/Cat/*.jpg"
birds = "CUB_200_2011/images/*/*.jpg"
num_cats =
cats
|> Path.wildcard()
|> Enum.count()
num_birds =
birds
|> Path.wildcard()
|> Enum.count()
IO.write("Number of cats: #{num_cats}, Number of birds: #{num_birds}")
Fortunately, our dataset is relatively balanced. In total we have around 25000 images. This is a little on the low side for most practical deep learning problems. Both data quantity and data quality have a large impact on the performance of your neural networks. For our example here the data will suffice, but for practical purposes you’d want to conduct a full data exploration and analysis before diving in.
We can use Kino to get an idea of what examples in our dataset look like:
cats
|> Path.wildcard()
|> Enum.random()
|> File.read!()
|> Kino.Image.new("image/jpeg")
birds
|> Path.wildcard()
|> Enum.random()
|> File.read!()
|> Kino.Image.new("image/jpeg")
One thing you might notice is that our images are not normalized in terms of height and width. Axon requires all images to have the same height, width, and number of color channels. In order to train and run our neural network, we’ll need to process each image into the same dimensions.
Additionally, our images are encoded as PNGs and JPEGs. Axon only works with tensors, so we’ll need to read each image into a tensor before we can use it. We can do this using Pixel and a sprinkle of Nx. First, let’s see how we can go from image to tensor:
{:ok, image} =
cats
|> Path.wildcard()
|> Enum.random()
|> Pixels.read_file()
%{data: data, height: height, width: width} = image
data
|> Nx.from_binary({:u, 8})
|> Nx.reshape({4, height, width}, names: [:channels, :height, :width])
Nx encodes images as values at each pixel. By default, Pixels decodes images in RGBA format. So, for each pixel in an image with shape {height, width}, we have 4 8-bit integer values: red, green, blue, and alpha (opacity). Pixels conveniently gives us a binary of pixel data, and the height and width of the image. So we can create a tensor using Nx.from_binary/2 and then reshape to the correct input shape using Nx.reshape/2.
When working with images, it’s common to normalize pixel values to fall between 0 and 1. This helps stabilize the training of neural networks (most parameters are initialized to a value between 0 and 0.06). To do this, we can simply divide our image by 255:
data
|> Nx.from_binary({:u, 8})
|> Nx.reshape({4, height, width}, names: [:channels, :height, :width])
|> Nx.divide(255.0)
Now, let’s take some of this exploration and turn it into a legitimate input pipeline.
Input Pipeline
Now that we know how to get a tensor from an image, we can go about constructing the input pipeline. In this example, our pipeline will just be an Elixir Stream. In most machine learning applications, datasets will be too large to load entirely into memory. Instead, we want to construct an efficient pipeline of data preprocessing and normalization which runs in parallel with model training. For example, we can train our models entirely on the GPU, and process new data at the same time on the CPU.
Right now, we can retrieve our data as paths to separate image directories. We’ll start by labeling images in respective directories, shuffling the input data, and then splitting it into train, validation, and test sets:
cats_path_and_label =
cats
|> Path.wildcard()
|> Enum.map(&{&1, 0})
birds_path_and_label =
birds
|> Path.wildcard()
|> Enum.map(&{&1, 1})
image_path_and_label = cats_path_and_label ++ birds_path_and_label
num_examples = Enum.count(image_path_and_label)
num_train = floor(0.8 * num_examples)
num_val = floor(0.2 * num_train)
{train, test} =
image_path_and_label
|> Enum.shuffle()
|> Enum.split(num_train)
{val, train} =
train
|> Enum.split(num_val)
This sort of dataset division is common when training neural networks. Each separate dataset serves a separate purpose:
- Train set - consists of examples that the network explicitly trains on. This should be the largest portion of your dataset, typically 70-90% depending on dataset size.
- Validation set - consists of examples used to evaluate the model during training. Because the examples in the validation set are not explicitly trained on, they provide a means of monitoring the model for overfitting. Typically a small percentage of the train set.
- Test set - consists of examples unseen during training and validation, used to validate the trained model’s performance.
As you’ll see, Axon makes it easy to create training and evaluation pipelines which make use of all of these datasets.
Next, we’ll create a function which returns a stream given a list of image paths and labels. Our stream should:
- Parse each image path into a tensor, filtering out bad images
- Pad or crop each image to a fixed size
- Rescale the image pixel values to fall between 0 and 1
- Batch the input images
A batch is just a collection of training examples. In theory, we’d want to update a neural network’s parameters using the gradient of the model’s loss with respect to each parameter, computed over the entire training dataset. In reality, most datasets are far too large for this. Instead, we update models incrementally on batches of training data. One full pass of batches through the entire dataset is called an epoch.
In this example, we’ll group images into batches of 32. The choice of batch size is arbitrary; however, it’s common to use batch sizes which are multiples of 32, e.g. 32, 64, 128, etc.
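The batching step can be previewed in isolation with plain Elixir streams. Note how the :discard option drops the final partial batch (shown here with a toy batch size of 4 on the range 1..10):

```elixir
batches =
  1..10
  |> Stream.chunk_every(4, 4, :discard)
  |> Enum.to_list()

# [[1, 2, 3, 4], [5, 6, 7, 8]] - the partial batch [9, 10] is discarded
```

Discarding the leftover keeps every batch the same shape, which is what our fixed-shape tensors (and the JIT-compiled model) expect.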
max_height = 32
max_width = 32
batch_size = 32
resize_dimension = fn tensor, dim, limit ->
  axis = Nx.axis_index(tensor, dim)
  axis_size = Nx.axis_size(tensor, dim)
  cond do
    axis_size == limit ->
      tensor
    axis_size < limit ->
      # Randomly pad the target axis with zeros until it reaches the limit
      pad_top = :rand.uniform(limit - axis_size)
      pad_bottom = limit - (axis_size + pad_top)
      last_axis = Nx.rank(tensor) - 1
      pads = for i <- 0..last_axis, do: if(i == axis, do: {pad_top, pad_bottom, 0}, else: {0, 0, 0})
      Nx.pad(tensor, 0, pads)
    :otherwise ->
      # Randomly crop the target axis down to the limit
      slice_start = :rand.uniform(axis_size - limit)
      Nx.slice_axis(tensor, slice_start, limit, axis)
  end
end
resize_and_rescale = fn image ->
image
|> resize_dimension.(:height, max_height)
|> resize_dimension.(:width, max_width)
|> Nx.divide(255.0)
end
pipeline = fn paths ->
paths
|> Flow.from_enumerable()
|> Flow.flat_map(fn {path, label} ->
case Pixels.read_file(path) do
{:error, _} ->
[:error]
{:ok, image} ->
%{data: data, height: height, width: width} = image
tensor =
data
|> Nx.from_binary({:u, 8})
|> Nx.reshape({4, height, width}, names: [:channels, :height, :width])
[{tensor, label}]
end
end)
|> Stream.reject(fn
:error -> true
_ -> false
end)
|> Stream.map(fn {img, label} ->
{Nx.Defn.jit(resize_and_rescale, [img]), label}
end)
|> Stream.chunk_every(batch_size, batch_size, :discard)
|> Stream.map(fn imgs_and_labels ->
{imgs, labels} = Enum.unzip(imgs_and_labels)
{Nx.stack(imgs), Nx.new_axis(Nx.stack(labels), -1)}
end)
end
Let’s break down this input pipeline a little more. First, we create a function which resizes input images to have a max height and max width of 32. You can make your images larger, but this will consume more memory and make the training process a little bit slower. You might see a slight boost in final accuracy as the image retains more of the original image’s information. Our random crop or pad function is actually pretty bad in terms of performance. This is because libraries such as XLA do really poorly with dynamic input shapes. A better solution would be to make use of dedicated image manipulation routines such as those in OpenCV. For this example, our solution will suffice.
Next, we define our pipeline using Flow. Flow will apply our image reading and decoding routine concurrently to our list of input paths. Pixels takes care of the work of actually decoding our images into binary data. Unfortunately, some of the images in our dataset are corrupted. Thus, we need to mark these with :error and throw them out before we attempt to train with them.
Next we apply our resizing method with Nx.Defn.jit. Nx.Defn.jit uses the default compiler options to explicitly JIT compile a function. Typically, we’d define the functions we want to accelerate within a module as defn; however, we can also define anonymous functions and explicitly JIT compile them this way. After we have our images as tensors, we group adjacent examples into groups of 32 and “stack” them on top of each other. Our final stream will return tensors of shapes {32, 4, 32, 32} and {32, 1} in a lazy manner. This will ensure we don’t load every image into memory at once, but instead load them as we need them. We can use our pipeline function to create pipelines from the splits we defined previously:
train_data = pipeline.(train)
val_data = pipeline.(val)
test_data = pipeline.(test)
With our pipelines created, it’s time to create our model!
The Model
Before we can train a model, we need a model to train! Axon makes the process of creating neural networks easy with its model creation API. Axon defines the layers of a neural network as composable functions. Each function returns an Axon struct which retains information about the model for use during initialization and prediction. The model we’ll define here is known as a convolutional neural network. It’s a special kind of neural network used mostly in computer vision tasks.
All Axon models start with an explicit input definition. This is necessary because successive layer parameters depend specifically on the input shape. You are allowed to define one dimension as nil, representing a variable batch size. Our images are in batches of 32 with 4 color channels in a 32x32 image. Thus, our input shape is {nil, 4, 32, 32}. Following the input definition, you define each successive layer. You can essentially read the model from the top down as a series of transformations.
model =
Axon.input({nil, 4, 32, 32})
|> Axon.conv(32, kernel_size: {3, 3})
|> Axon.batch_norm()
|> Axon.relu()
|> Axon.max_pool(kernel_size: {2, 2})
|> Axon.conv(64, strides: [2, 2])
|> Axon.batch_norm()
|> Axon.relu()
|> Axon.max_pool(kernel_size: {2, 2})
|> Axon.conv(32, kernel_size: {3, 3})
|> Axon.batch_norm()
|> Axon.relu()
|> Axon.global_avg_pool()
|> Axon.dense(1, activation: :sigmoid)
Notice how Axon gives us a nice table which shows how each layer transforms the model input, as well as the number of parameters in each layer and in the model as a whole. This is a high-level summary of the model, and can be useful for debugging intermediate shape issues and for determining the size of a given model.
For this post, we’ll gloss over the details of what each layer does and how it helps the neural network learn good representations of the input data. However, one thing that is important to note is the final sigmoid layer. Our problem is a binary classification problem. That means we want to classify images in one of two classes: bird or not bird. Because of this, we want our neural network to predict a probability between 0 and 1. Probabilities closer to 1 indicate a higher confidence that an example is a bird. Probabilities closer to 0 represent a lower confidence in an example being a bird. sigmoid is a function which always returns a value between 0 and 1. Thus, it will return the probability we’re looking for.
Training Day
Now that we’ve defined the network, it’s time to define the training process! Axon abstracts the training and evaluation process into a unified Loop API. Training and evaluation are really just loops which carry state over some dataset. Axon takes away as much of the boilerplate of writing these loops away as possible.
In order to define a training loop, we start from the Axon.Loop.trainer/4 factory method. This creates a Loop struct with some pre-populated fields specific to model training. Axon.Loop.trainer/4 takes four parameters:
- The model - this is the model we want to train
- The loss - this is our training objective
- The optimizer - this is how we will train
- Options - miscellaneous options
We’ve already defined our model. In this example, we’ll use the binary_cross_entropy loss function. This is the loss function you’ll want to use with most binary classification tasks. Our optimizer is the adam optimizer. Adam is a variant of gradient descent which works pretty well for most tasks. Finally, we specify the log option to tell our trainer to log training output on every iteration.
After creating a loop, it’s necessary to instrument it with metrics and handlers. metrics are anything you want to track during training. For example, we want to keep track of our model’s accuracy during training. Accuracy is a bit more readily interpretable than loss, so this will help us ensure that our model is actually training. handlers run on specific events. For example, logging is actually implemented as a handler which runs after each batch. In this example, we’ll call the validate handler, which will run a validation loop at the end of each epoch. Our validation loop will let us know if our model is overfitting on the training data.
Finally, after creating and instrumenting our loop, we need to run it. Axon.Loop.run/3 takes the actual loop we want to run, the input data we want to loop over, and some loop-specific options. In this example, we’ll have our loop run for a total of 5 epochs. That means we will run our loop a total of 5 full times through the training data (note this will take upwards of 20 minutes to complete depending on the capabilities of your machine):
model_state =
model
|> Axon.Loop.trainer(:binary_cross_entropy, :adam, log: 1)
|> Axon.Loop.metric(:accuracy)
|> Axon.Loop.validate(model, val_data)
|> Axon.Loop.run(train_data, epochs: 5)
Notice how our model incrementally improves epoch over epoch. The output of our training loop is the trained model state. We can use this to evaluate our model on our test set. In a practical setting, you’d want to save this state for use in production.
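One way you might persist that state is with Erlang’s External Term Format. Treat this as a sketch under assumptions: the file name is arbitrary, a placeholder map stands in for the real model_state returned by Axon.Loop.run/3, and whether term_to_binary round-trips Nx tensors depends on the backend, so check Axon’s own serialization support before relying on this in production:

```elixir
# Placeholder standing in for the trained state returned by Axon.Loop.run/3
model_state = %{"dense_0" => %{"kernel" => [0.1, 0.2], "bias" => [0.0]}}

# Serialize to disk using Erlang's External Term Format
File.write!("model_state.bin", :erlang.term_to_binary(model_state))

# Later (e.g. in production), read it back
restored = "model_state.bin" |> File.read!() |> :erlang.binary_to_term()
restored == model_state  # true
```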
Did we need a research team and 5 years?
The process of writing an evaluation loop is very similar to the process of writing a training loop in Axon. We start from a factory, in this case Axon.Loop.evaluator/2. This function takes the model we want to evaluate, and the trained model state.
Next, we instrument again with metrics and handlers.
Finally, we run – this time on our test set:
model
|> Axon.Loop.evaluator(model_state)
|> Axon.Loop.metric(:accuracy)
|> Axon.Loop.run(test_data)
Our model finished with 78% accuracy, which means we were able to differentiate between birds and not birds about 78% of the time. Considering this took us under an hour to do, I would say that’s pretty incredible progress!
Conclusion
While this post glossed over many of the very specific details of how neural networks work, I hope it demonstrated the power of neural networks to perform well on what we once perceived to be very challenging machine learning problems. Additionally, I hope this post inspired you to take a deeper look into the field of deep learning and specifically into Axon.
In future posts, we’ll take a much closer look at the details and math underpinning neural networks and training neural networks, at how Axon makes use of Nx under the hood, and at some more specific problems that you can use Axon to solve. If you’re interested in learning more about Axon or Nx, be sure to check out the Elixir Nx Organization, come chat with us in the EEF ML Working Group Slack, or ask questions on the Nx Elixir Forum.
Until next time!